A Survey on Spark Ecosystem: Big Data Processing Infrastructure, Machine Learning, and Applications
Similar resources
A survey of machine learning for big data processing
There is no doubt that big data are now rapidly expanding in all science and engineering domains. While the potential of these massive data is undoubtedly significant, fully making sense of them requires new ways of thinking and novel learning techniques to address the various challenges. In this paper, we present a literature survey of the latest advances in research on machine learning for ...
Spark-BDD: Debugging Big Data Applications
Apache Spark has become a key platform for Big Data Analytics, yet it lacks complete support for debugging analytics programs. As a result, the development of a new analytical toolkit can be a painstakingly long process [7, 2, 4]. To fill this gap, we are developing Spark-BDD (Big Data Debugger), which brings a traditional interactive debugger experience to the Spark platform. Analytic programm...
A Survey of Machine Learning Methods for Big Data
Nowadays, studies in many fields aim to extract relevant information on trends, challenges, and opportunities; these studies have one thing in common: they work with large volumes of data. This work analyzes different studies carried out on the use of Machine Learning (ML) for processing large volumes of data (Big Data). Most of these datasets are complex and come from vario...
Benchmarking Apache Spark with Machine Learning Applications
We benchmarked Apache Spark with a popular parallel machine learning training application, Distributed Stochastic Gradient Descent for Matrix Factorization [5] and compared the Spark implementation with alternative approaches for communicating model parameters, such as scheduled pipelining using POSIX socket or MPI, and distributed shared memory (e.g. parameter server [13]). We found that Spark...
Journal
Journal title: IEEE Transactions on Knowledge and Data Engineering
Year: 2020
ISSN: 1041-4347,1558-2191,2326-3865
DOI: 10.1109/tkde.2020.2975652